Empowering Adaptive Early-Exit Inference with Latency Awareness

Authors

Abstract

With the capability of trading accuracy for latency on-the-fly, the technique of adaptive early-exit inference has emerged as a promising line of research to accelerate deep learning inference. However, studies in this line commonly use a group of thresholds to control the accuracy-latency trade-off, and a thorough and general methodology on how to determine these thresholds has not been conducted yet, especially with regard to the common requirements on average inference latency. To address this issue and enable latency-aware adaptive early-exit inference, in the present paper, we approximately formulate the threshold determination problem of finding the accuracy-maximum threshold setting that meets a given average latency requirement, and then propose a threshold determination method to tackle our formulated non-convex problem. Theoretically, we prove that, for certain parameter settings, our method finds an approximate stationary point of the formulated problem. Empirically, on top of various models across multiple datasets (CIFAR-10, CIFAR-100, ImageNet and two time-series datasets), we show that our method can well handle the average latency requirements, and consistently finds good threshold settings in negligible time.
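As an illustration of the quantities the abstract refers to, the following is a minimal sketch of threshold-controlled early exit, not the paper's threshold determination method. It assumes a model with several exits, where a sample leaves at the first exit whose confidence reaches that exit's threshold, and it measures the accuracy and average latency that a given threshold setting produces. The function names, the confidence values, and the per-exit latencies are all hypothetical.

```python
from typing import Sequence, Tuple


def choose_exit(confidences: Sequence[float], thresholds: Sequence[float]) -> int:
    """Return the index of the first exit whose confidence meets its threshold.

    Assumption: the final exit has no threshold, so
    len(thresholds) == len(confidences) - 1.
    """
    for k, tau in enumerate(thresholds):
        if confidences[k] >= tau:
            return k
    return len(confidences) - 1  # fall through to the last exit


def evaluate(
    conf: Sequence[Sequence[float]],
    correct: Sequence[Sequence[bool]],
    exit_latency: Sequence[float],
    thresholds: Sequence[float],
) -> Tuple[float, float]:
    """Return (accuracy, average latency) for one threshold setting.

    conf[i][k]      confidence of sample i at exit k (hypothetical values)
    correct[i][k]   whether exit k predicts sample i correctly
    exit_latency[k] cumulative latency of leaving at exit k
    """
    exits = [choose_exit(c, thresholds) for c in conf]
    accuracy = sum(correct[i][k] for i, k in enumerate(exits)) / len(exits)
    avg_latency = sum(exit_latency[k] for k in exits) / len(exits)
    return accuracy, avg_latency


# Hypothetical data: 4 samples, 3 exits, latencies in milliseconds.
CONF = [[0.9, 0.95, 0.99], [0.5, 0.8, 0.9], [0.3, 0.4, 0.7], [0.85, 0.9, 0.95]]
CORRECT = [[True, True, True], [False, True, True],
           [False, False, True], [True, True, True]]
LATENCY = [1.0, 2.0, 4.0]

strict = evaluate(CONF, CORRECT, LATENCY, thresholds=[0.8, 0.75])
loose = evaluate(CONF, CORRECT, LATENCY, thresholds=[0.4, 0.4])
```

On this toy data the stricter thresholds give higher accuracy at higher average latency than the looser ones, which is the trade-off a threshold determination method has to navigate under a latency requirement.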

Similar resources

Low Latency RNN Inference with Cellular Batching

Performing inference on pre-trained neural network models must meet the requirement of low-latency, which is often at odds with achieving high throughput. Existing deep learning systems use batching to improve throughput, which do not perform well when serving Recurrent Neural Networks with dynamic dataflow graphs. We propose the technique of cellular batching, which improves both the latency a...

Early exit: Estimating and explaining early exit from drug treatment

BACKGROUND Early exit (drop-out) from drug treatment can mean that drug users do not derive the full benefits that treatment potentially offers. Additionally, it may mean that scarce treatment resources are used inefficiently. Understanding the factors that lead to early exit from treatment should enable services to operate more effectively and better reduce drug related harm. To date, few stud...

Automated Inference with Adaptive Batches

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative “big batch” SGD schemes that adaptively grow the batch size ove...

Predicting Survival of Patients with Lung Cancer Using Improved Adaptive Neuro-Fuzzy Inference System

Introduction: Lung cancer is the main cause of mortality in both genders worldwide. This disease is caused by the uncontrollable growth and development of cells in both or one of the lungs. Although the early diagnosis of this cancer is not an easy task, the earlier it is diagnosed, the higher will be the chance of treating. The objective of this study was to develop an optimized prediction mod...

Adaptive Latency Insensitive Protocols and Elastic Circuits with Early Evaluation: A Comparative Analysis

Latency Insensitive Protocols (LIP) and Elastic Circuits (EC) solve the same problem of rendering a design tolerant to additional latencies caused by wires or computational elements. They are performance-limited by a firing semantics that enforces coherency through a lazy evaluation rule: Computation is enabled if all inputs to a block are simultaneously available. Adaptive LIP’s (ALIP) and EC ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i11.17181